Vulnerability Discovery in Open Source Libraries: Analyzing CVE-2020-11863 | McAfee Blogs
Open Source projects are the building blocks of any software development process. As we indicated in our previous blog, as more and more products use open source code, the increase in the overall attack surface is inevitable, especially when open source code is not audited before use. Hence it is recommended to thoroughly test it for potential vulnerabilities and collaborate with developers to fix them, eventually mitigating the attacks. We also indicated that we were researching graphics libraries in Windows and Linux, reporting multiple vulnerabilities in Windows GDI as well as Linux vector graphics library libEMF. We are still auditing many other Linux graphics libraries since these are legacy code and have not been strictly tested before.
In part 1 of this blog series, we described in detail the significance of open source research by outlining the vulnerabilities we reported in the libEMF library. We also highlighted the importance of compiling the code with memory sanitizers and how doing so can help detect a variety of memory corruption bugs. In summary, the Address Sanitizer (ASAN) intercepts memory allocation and deallocation functions such as malloc() and free() and fills the memory with the respective fill bytes (malloc_fill_byte / free_fill_byte). It also monitors reads and writes to these memory locations, helping detect erroneous accesses at run time.
In this blog, we provide a more detailed analysis for one of the reported vulnerabilities, CVE-2020-11863, which was due to the use of uninitialized memory. This vulnerability is related to CVE-2020-11865, a global object vector out of bounds memory access in the GlobalObject::Find() function in libEMF. However, the crash call stack turned out to be different, which is why we decided to examine this further and produce this deep dive blog.
The information provided by the ASAN was sufficient to reproduce the vulnerability crash outside of the fuzzer. From the ASAN information, the vulnerability appeared to be a null pointer dereference, but this was not the actual root cause, as we will discuss below.
Looking at the call stack, it appears that the application crashed while dynamically casting the object, for which there could be multiple reasons. The two most likely are that the application attempted to access a non-existent virtual table pointer, or that the object address returned from the function was a wild address that was then accessed. While debugging to get more context about this crash, we came across an interesting register value. The disassembly below shows the crash point, indicating the non-existent memory access.
If we look at the state of the registers at the crash point, it is particularly interesting to note that the register rdi has an unusual value of 0xbebebebebebebebe. We wanted to dig a little deeper to check out how this value got into the register, resulting in the wild memory access. Since we had the source of the library, we could check right away what this register meant in terms of accessing the objects in memory.
Referring to the Address Sanitizer documentation, it turns out that ASAN writes 0xbe to newly allocated memory by default, which means this 64-bit value came from memory that was allocated but never initialized by the application. ASAN calls this byte the malloc_fill_byte. Similarly, it fills memory with the free_fill_byte when the memory is freed. This helps identify memory access errors.
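To make the origin of the register value concrete, the following minimal C++ sketch fills a pointer-sized slot with a single byte and reads it back as a 64-bit integer. The helper name read_filled_slot is hypothetical; it only emulates a fill-on-allocate pattern and does not call into ASAN.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of ASAN or libEMF): emulates an allocator
// that fills fresh memory with one byte, then reads a pointer-sized slot.
std::uint64_t read_filled_slot(unsigned char fill_byte) {
    unsigned char slot[sizeof(std::uint64_t)];
    std::memset(slot, fill_byte, sizeof slot);  // what a fill-on-malloc would do
    std::uint64_t value = 0;
    std::memcpy(&value, slot, sizeof value);    // read the slot as one 64-bit word
    return value;
}
```

Calling read_filled_slot(0xBE) yields 0xbebebebebebebebe, the same pattern observed in rdi. Any pointer loaded from such memory before the application writes to it will carry this value.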
This nature of the ASAN can also be verified in the libsanitizer source here. Below is an excerpt from the source file.
Looking at the stack trace at the crash point, shown below, the crash occurred in the SelectObject() function. This part of the code is responsible for processing the EMR_SELECTOBJECT record structure of the Enhanced Metafile (EMF), and the graphics object handle passed to the function is 0x80000018. We want to investigate the flow of the code to check whether this value comes directly from the input EMF file and can be controlled by an attacker.
In the SelectObject() function, while processing the EMR_SELECTOBJECT record structure, the handle to the GDI object is passed to GlobalObjects.find(), as shown in the above code snippet. That function accesses the global stock object vector: it masks off the high-order bit of the GDI object handle, uses the result as an index, and returns the stock object reference stored at that index. The stock object enumeration, described in the MS documentation, specifies the indexes of predefined logical graphics objects that can be used in graphics operations. For instance, if the object handle is 0x80000018, it is ANDed with 0x7FFFFFFF, resulting in 0x18, which is used as the index into the global stock object vector. The stock object reference is then dynamically cast to a graphics object, after which the EMF::GRAPHICSOBJECT member function getType() is called to determine the type of the graphics object. Later in this function, it is cast again to the appropriate graphics object (BRUSH, PEN, FONT, PALETTE, EXTPEN), as shown in the below code snippet.
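The handle-to-index arithmetic described here can be sketched in a few lines of C++. This is an illustration only, not the libEMF source; the constant and function names (kStockObjectBit, kIndexMask, has_stock_bit, stock_index) are assumptions introduced for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed names, chosen for illustration; libEMF uses its own identifiers.
constexpr std::uint32_t kStockObjectBit = 0x80000000u;  // high-order bit of the handle
constexpr std::uint32_t kIndexMask      = 0x7FFFFFFFu;  // remaining 31 bits

// True when the handle has the high-order bit set.
constexpr bool has_stock_bit(std::uint32_t handle) {
    return (handle & kStockObjectBit) != 0;
}

// Strip the high-order bit to obtain the index into the object vector.
constexpr std::size_t stock_index(std::uint32_t handle) {
    return handle & kIndexMask;
}

// 0x80000018 & 0x7FFFFFFF == 0x18 (decimal 24).
static_assert(stock_index(0x80000018u) == 0x18, "mask keeps the low 31 bits");
```

Note that the mask alone places no upper limit on the index: any value up to 0x7FFFFFFF survives it, so the range check has to happen separately.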
EMF::GRAPHICSOBJECT is a class derived from EMF::OBJECT; the inheritance diagram of the EMF::OBJECT class is shown below.
However, as mentioned earlier, we were interested in knowing whether the object handle, passed as an argument to the SelectObject() function, can be controlled by an attacker. To get context on this, let us look at the format of the EMR_SELECTOBJECT record, shown below.
As we notice here, ihObject is the 4-byte unsigned integer specifying the index to the stock object enumeration. In this case the stock object references are maintained in the global objects vector. Here, the object handle of 0x80000018 implies that index 0x18 will be used to access the global stock object vector. If, at this point, the length of the object vector is 0x18 or less and no length check is done before accessing the object vector, the result is an out of bounds memory access.
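Because ihObject is read straight from the file, it helps to see how little stands between the input bytes and the handle. The sketch below decodes a 12-byte EMR_SELECTOBJECT record, assuming the layout of three little-endian 32-bit fields (type 0x25, size 12, ihObject) as we understand it from the MS-EMF specification. The struct and function names are hypothetical and are not taken from libEMF.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout: three little-endian 32-bit fields.
struct SelectObjectRecord {
    std::uint32_t type;      // record type, 0x25 for EMR_SELECTOBJECT
    std::uint32_t size;      // record size in bytes, 12 for this record
    std::uint32_t ihObject;  // object table index or stock object handle
};

// Read a 32-bit little-endian value independent of host endianness.
inline std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical parser: returns false if the buffer is not a well-formed record.
bool parse_select_object(const unsigned char* data, std::size_t len,
                         SelectObjectRecord& out) {
    if (data == nullptr || len < 12) return false;
    SelectObjectRecord rec = { read_le32(data), read_le32(data + 4),
                               read_le32(data + 8) };
    if (rec.type != 0x25u || rec.size != 12u) return false;
    out = rec;  // ihObject is copied verbatim from the file
    return true;
}
```

Feeding it the bytes 25 00 00 00 0C 00 00 00 18 00 00 80 produces ihObject equal to 0x80000018, which shows that the handle is fully attacker-controlled unless the consumer validates it.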
Below is the visual representation of processing the EMR_SELECTOBJECT metafile record.
While debugging this issue, we set a breakpoint at GlobalObjects.find() and continue until we have object handle 0x80000018; essentially, we reach the point where the above highlighted EMR_SELECTOBJECT record is being processed. As shown below, the object handle is converted into the index (0x18 = 24) used to access the object vector of size (0x16 = 22), resulting in an out of bounds access, which we reported as CVE-2020-11865.
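The missing check can be illustrated with a small, self-contained lookup. This is a sketch under our own assumptions, not the fix shipped by the libEMF maintainers: Object stands in for EMF::OBJECT, and find_stock_object is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for EMF::OBJECT.
struct Object {
    virtual ~Object() {}
};

// Bounds-checked lookup: returns nullptr instead of reading past the vector.
Object* find_stock_object(const std::vector<Object*>& objects,
                          std::uint32_t handle) {
    const std::size_t index = handle & 0x7FFFFFFFu;  // strip the high-order bit
    if (index >= objects.size()) {
        return nullptr;  // e.g. index 24 against a 22-element vector
    }
    return objects[index];
}
```

With a 22-element vector, the handle 0x80000018 maps to index 24 and the function returns nullptr, while a handle such as 0x80000005 returns the stored pointer. Using std::vector::at() instead of operator[] would give a similar guarantee by throwing std::out_of_range.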
Stepping further into the code, execution enters the STL vector header stl_vector.h, which implements the dynamic expansion of std::vector. Since the objects vector has only 22 elements at this point, the STL vector will expand the vector to the size indicated by the highlighted parameter, access the vector at the passed index, and return the value at that object reference, as shown in the below code snippet. That value turns out to be 0xbebebebebebebebe, as filled by ASAN.
The code uses std::allocator to manage the vector's memory, primarily for allocation and deallocation. On further analysis, it turns out that the value returned, 0xbebebebebebebebe in this case, is the virtual pointer of the non-existent stock object, which is dereferenced during dynamic casting, resulting in a crash.
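The reason the cast itself crashes is that dynamic_cast has to read the object's virtual table pointer to determine its run-time type. It can safely reject a null pointer or an object of the wrong type, but it cannot detect a wild pointer such as 0xbebebebebebebebe. The sketch below, using a hypothetical class hierarchy loosely modeled on EMF::OBJECT and EMF::GRAPHICSOBJECT, shows the two cases a caller can guard against.

```cpp
// Hypothetical hierarchy for illustration; not the libEMF class definitions.
struct Object {
    virtual ~Object() {}
};
struct GraphicsObject : Object {
    virtual int getType() const = 0;
};
struct Brush : GraphicsObject {
    int getType() const override { return 2; }  // arbitrary illustrative value
};
struct Header : Object {};  // an object that is not a graphics object

// Returns the graphics object type, or -1 when the pointer is null or
// does not refer to a GraphicsObject. A wild pointer is NOT detectable here.
int graphics_type_or_error(Object* obj) {
    GraphicsObject* gfx = dynamic_cast<GraphicsObject*>(obj);
    if (gfx == nullptr) {
        return -1;
    }
    return gfx->getType();
}
```

This is why the lookup has to return a well-defined value, such as nullptr, for an out-of-range handle: once an uninitialized pointer reaches the cast, the damage is already done.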
As mentioned in our earlier blog, the fixes to the library have been released in a subsequent version, available here.
Conclusion
While using third party code in products certainly saves time and increases development speed, it potentially comes with an increase in the volume of vulnerabilities, especially when the code remains unaudited and integrated into products without any testing. It is extremely critical to perform fuzz testing of the open source libraries used, which can help in discovering vulnerabilities earlier in the development cycle and provides an opportunity to fix them before the product is shipped, consequently mitigating attacks. However, as we emphasized in our previous blog, it is critical to strengthen the collaboration between vulnerability researchers and the open source community to continue responsible disclosures, allowing the maintainers of the code to address them in a timely fashion.